Abstract

In a social network, agents are intelligent and have the capability to make decisions to maximize their
utilities. They can either make wise decisions by taking advantage of other agents' experiences through
learning, or make decisions earlier to avoid competition from huge crowds. Both effects,
social learning and negative network externality, play important roles in the decision process of an agent.
While there are existing works on either social learning or negative network externality, a general study
that considers both of these contradictory effects is still limited. We find that the Chinese restaurant
process, a popular random process, provides a well-defined structure to model the decision process of
an agent under these two effects. By introducing strategic behavior into the non-strategic Chinese
restaurant process, in Part I of this two-part paper, we propose a new game, called the Chinese Restaurant
Game, to formulate the social learning problem with negative network externality. By analyzing the
proposed Chinese restaurant game, we derive the optimal strategy of each agent and provide a recursive
method to achieve the optimal strategy. How social learning and negative network externality influence
each other under various settings is also studied through simulations.
I. Introduction

How agents in a network learn and make decisions is an important issue in numerous research fields, 
such as social learning in social networks, machine learning with communications among devices, and 
cognitive adaptation in cognitive radio networks. Agents make decisions in a network in order to achieve 
certain objectives. For example, a customer goes to the supermarket for orange juice and may need
to choose one from dozens of brands. However, the agent's knowledge of the market may be very limited
due to a limited ability to observe or to external uncertainty in the market, which means that the
customer may not know the quality of the orange juice of every brand. This limitation reduces the
accuracy of the agent's decision with respect to his objective, e.g., getting the orange juice that best suits his taste.

The limited knowledge of one agent can be expanded through learning. One agent may learn from 
some information sources, such as the decisions of other agents, the advertisements from some brands, or 
his experience in previous purchases. All the information can help the agent to construct a belief, which 
is mostly probabilistic, on the unknown state. In most cases, the accuracy of the agent's decision can be 
greatly enhanced by taking into account the belief. A general learning and decision making process in a 
network can be described as follows. First, an agent collects information through available communication 
or observation methods and updates his belief on the uncertain states based on the collected information. 
Then, the agent estimates the expected rewards of certain actions according to the belief he constructed. 
Finally, the agent chooses the action that maximizes his reward. 

Let us consider a social network in an uncertain system state. The state has an impact on the agents' 
rewards. When the impact is differential, i.e., one action results in a higher reward than other actions in 
one state but not in all states, the state information becomes critical for one agent to make the correct 
decision. In most of the social learning literature, the state information is unknown to agents. Nevertheless, some
signals related to the system state are revealed to the agents. These signals may be kept private
or revealed to others. Then, the agents make their decisions sequentially, while their actions/signals may
be fully or partially observed by other agents. Most existing works [1]-[5] study how the beliefs of
agents are formed through learning in the sequential decision process, and how accurate the beliefs will
be when more information is revealed. One popular assumption in the traditional social learning literature is
that there is no network externality, i.e., the actions of subsequent agents do not influence the rewards of
earlier agents. In such a case, agents make their decisions purely based on their own beliefs without
considering the actions of subsequent agents. This assumption greatly limits the potential applications of
these existing works.

The network externality, i.e., the influence of other agents' behaviors on one agent's reward, is a
classic topic in economics. How the relations among agents influence an agent's behavior is one of the major
problems in coordination game theory [6]. When the network externality is positive, the problem can be
modeled as a coordination game, where agents seek the best common decisions to cooperate with others.
When the externality is negative, it becomes an anti-coordination game, where agents try to avoid making
the same decisions as others [6]-[8].

In the literature, there are some works that combine positive network externality with social
learning, such as the voting game [9]-[11] and the investment game [12]-[15]. In the voting game, an election
with several candidates is held, where voters have their own preferences over the candidates. The preference
of a voter over the candidates is constructed from the voter's belief on how the candidates can benefit him if
they win the election. However, since a candidate can make efforts only when he wins the election, a
voter's vote depends not only on his own preference but also on the probability that the candidate wins
the election. In such a case, the estimation and prediction of the decisions of other voters become critical
in the voting game. A learning process is involved when the voting game is sequential, i.e., voters vote
for the candidates sequentially and the vote of each voter is known by others. In the sequential voting game,
voters learn from the previous votes to update their beliefs on the candidates and on the probability that
the candidates win the election.

In the investment game, there are multiple projects and investors, where each project has a different
probability of success and a different payoff. An investor may invest in one or several projects if his budget
allows. If a project succeeds, he receives a payoff from the project. When more investors invest in the
same project, the success probability of the project increases, which benefits all investors investing in
this project. Note that in both voting and investment games, an agent's decision has a positive effect on
subsequent agents' decisions. When one agent makes a decision, the subsequent agents are encouraged to make the
same decision in two respects: the probability that this action has a positive outcome increases due to
this agent's decision, and the potential reward of this action may be significantly large according to the
belief of this agent.

February 14, 2012 DRAFT

The combination of negative network externality with social learning, on the other hand, is difficult to 
analyze. When the network externality is negative, the game becomes an anti-coordination game, where 
one agent seeks the strategy that differs from others' to maximize his own reward. Nevertheless, in such 
a scenario, the agent's decision also contains some information about his belief on the uncertain system 
state, which can be learned by subsequent agents through social learning algorithms. Thus, subsequent
agents may then realize that his choice is better than the others, and make the same decision as the agent.
Since the network externality is negative, the information leaked by the agent's decision may impair the 
reward the agent can obtain in the game. Therefore, rational agents should take into account the possible 
reactions of subsequent players to maximize their own rewards. 

Negative network externality plays an important role in many applications in different research
fields, such as spectrum access in cognitive radio, storage service selection in cloud computing, and deal
selection on Groupon in online social networking. In the spectrum access problem, for instance, secondary
users accessing the same spectrum need to share it with each other. The more secondary users access the
same channel, the less access time is available for each of them. In the storage service selection problem,
the reliability and availability are affected by the number of subscribers. The more subscribers use
the same service, the lower the service quality of the cloud storage platform. For deal selection on the
Groupon website, some businesses may receive an overwhelming number of customers under the discounted
deal. The overwhelming number of customers has a negative network externality on the quality of the 
products. In these examples, the negative network externality degrades the utility of the agents making 
the same decision. Therefore, the agents should take into account the possibility of degraded utility, e.g., 
less access time, lower reliability, or lower service quality, when making the decisions. 

The aforementioned social learning approaches are mostly strategic, where agents are considered as
players with bounded or unbounded rationality in maximizing their own rewards. Machine learning, which
is another class of approaches to the learning problem, focuses on designing algorithms that make use
of past experience to improve the performance of similar tasks in the future [16]. Generally, there
exists some training data, and the devices follow a learning method designed by the system designer to
learn and improve the performance of some specific tasks. Most learning approaches studied in machine
learning are non-strategic, i.e., the devices do not rationally consider their own benefits. Such non-strategic
learning approaches may not be applicable to scenarios where devices are rational and intelligent
enough to choose actions to maximize their own benefits instead of following the rules designed by the
system designer.

The Chinese restaurant process, which was introduced for non-parametric learning methods in machine learning
[17], provides an interesting non-strategic learning method for an unbounded number of objects. In the Chinese
restaurant process, there exists an infinite number of tables, where each table has an infinite number of seats.
An infinite number of customers enter the restaurant sequentially. When a customer enters
the restaurant, he can choose either to share a table with other customers or to open a new table, with
the probabilities being predefined by the process. Generally, if a table is occupied by more customers,
then a new customer is more likely to join the table, and the probability that a customer opens a new
table can be controlled by a parameter [18]. This process provides a systematic method to construct the
parameters for modeling unknown distributions.
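
The seating rule of the (non-strategic) Chinese restaurant process can be illustrated with a short simulation. The sketch below assumes the standard form of the process, in which the customer arriving after i others joins an occupied table k with probability n_k / (i + alpha) and opens a new table with probability alpha / (i + alpha). The parameter name `alpha` and the function name `sample_crp` are our own choices for illustration; the text above only states that such a parameter exists.

```python
import random


def sample_crp(num_customers, alpha, seed=0):
    """Seat customers one by one; return the list of table occupancies.

    alpha > 0 is the (assumed) concentration parameter controlling how
    often a new table is opened.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers currently at table k
    for _ in range(num_customers):
        # Weight n_k for each occupied table, weight alpha for a new table.
        weights = tables + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append(1)      # open a new table
        else:
            tables[choice] += 1   # join an existing table
    return tables


occupancy = sample_crp(num_customers=100, alpha=1.0)
print(occupancy, sum(occupancy))
```

Note that customers here follow the predefined probabilities and never reason about their own utility, which is exactly the non-strategic behavior that the proposed game replaces.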

By introducing strategic behavior into the non-strategic Chinese restaurant process, we propose
a new game, called the Chinese Restaurant Game, to formulate the social learning problem with negative
network externality. Let us consider a Chinese restaurant with K tables. There are N customers
sequentially requesting seats at these K tables for having their meals. Each customer may request
one of the tables by number. After requesting, he will be seated at the table he requested. We assume
that all customers are rational, i.e., they prefer a bigger space for a comfortable dining experience. Thus,
a customer is delighted if he has a bigger table. However, since all tables are available to all customers,
he may need to share the table with others if multiple customers request the same table. In such a
case, the customer's dining space is reduced, which impairs the dining experience. Therefore, the
key issue in the proposed Chinese restaurant game is how the customers choose the tables to enhance
their own dining experience. This model involves negative network externality since a customer's
dining experience is impaired when others share the same table with him. Moreover, when the table
size is unknown to the customers, but each of them receives some signals related to the table size, this
game involves a learning process if customers can observe previous actions or signals. Such a game-theoretic
Chinese restaurant game framework is very general and can be applied to many research areas, such as
online social networks, wireless communication, and cloud computing, which will be discussed in Part
II of this two-part paper [19].

In the rest of the paper, we first provide detailed descriptions of the system model of the Chinese restaurant
game in Section III. Then, we analyze the simultaneous game model to show how customers behave given
perfect knowledge of the table sizes in Section IV. Next, we study the sequential game model with
perfect information to illustrate the advantage of playing first in Section V. In Section VI, we show
the general Chinese restaurant game framework by analyzing the learning behaviors of customers under
negative network externality and an uncertain system state. We provide a recursive method to construct
the best response for customers, and discuss the simulation results in Section VII. Finally, we draw
conclusions in Section VIII.

II. Related Works 

A strategic game model closely related to our work is the global game [20], [21]. In the global game, all
agents, with limited knowledge of the system state and of the information held by other agents, make decisions
simultaneously. An agent's reward in the game is determined by the system state and the number of
agents making the same decision as him. The influence may be positive or negative depending on
the type of network externality. An important characteristic of the global game is that the equilibrium is
unique, which simplifies the discussion of the possible outcomes of the game. It has drawn great attention in
various research fields, such as financial crises [22], sensor networks [23], and cognitive radio networks
[24]. Since all players in the global game make decisions simultaneously, there is no learning involved
in the global game.

In recent years, several works [13], [14], [25]-[27] have made efforts to introduce learning and signaling
into the global game. Dasgupta first investigated a binary investment model in which a
project succeeds only when a sufficient number of agents invest in it [13]. Then, Dasgupta
studied a two-period dynamic global game, where the agents have the option to delay their decisions in
order to obtain better private information about the unknown state [14].

Angeletos et al. studied a specific dynamic global game called the regime change game [25], [26]. In the
regime change game, each agent may attack the status quo, i.e., the current political
state of the society. When the collected attacks are large enough, the status quo is abandoned and all
attackers receive positive payoffs. If the status quo does not change, the attackers receive negative payoffs.
Angeletos et al. first studied a signaling model with signals at the beginning of the game in [25]. Then,
they proposed a multiple-stage dynamic game to study the learning behaviors of agents in the regime
change game in [26].

Costain provided a more general dynamic global game with an unknown binary state and a general 
utility function in [27]. The utility function includes information revelation, strategic complementarities, 
and payoff heterogeneity. To simplify the analysis, the positions of the agents in the game are assumed 
to be unknown. Nevertheless, most of these works study the multiplicity of equilibria in dynamic global
games with simplified models, such as a binary state or a binary investment model. Moreover, the network
externality considered in their models is mostly positive. By proposing the Chinese restaurant
game, we provide a more general game-theoretic framework for studying social learning in
a network with negative network externality, which has many applications in various research fields.

III. System Model 

Let us consider a Chinese restaurant with $K$ tables numbered $1, 2, \ldots, K$ and $N$ customers labeled
$1, 2, \ldots, N$. Each customer requests one table for having a meal. Multiple customers may
request the same table. Each table has infinitely many seats, but the tables may differ in size. We model the
table sizes of a restaurant with two components: the restaurant state $\theta$ and the table size functions
$\{R_1(\theta), R_2(\theta), \ldots, R_K(\theta)\}$. The state $\theta$ represents an objective parameter, which may change when the
restaurant is remodeled. The table size function $R_j(\theta)$ is fixed, i.e., the functions $\{R_1(\theta), R_2(\theta), \ldots, R_K(\theta)\}$
will be the same every time the restaurant is remodeled. An example of $\theta$ is the order of existing tables.
Suppose that the restaurant has two tables, one of size $L$ and the other of size $S$. Then, the owner may
choose to number the large one as table 1 and the small one as table 2, or the other way around. The decision on the numbering
can be modeled as $\theta \in \{1, 2\}$, while the table size functions $R_1(\theta)$ and $R_2(\theta)$ are given as $R_1(1) = L$,
$R_1(2) = S$, and $R_2(1) = S$, $R_2(2) = L$. Let $\Theta$ be the set of all possible states of the restaurant. In this
example, $\Theta = \{1, 2\}$.

We formulate the table selection problem as a game, called the Chinese Restaurant Game. We first denote
$\mathcal{X} = \{1, \ldots, K\}$ as the action set (tables) that a customer may choose, where $x_i \in \mathcal{X}$ means that customer $i$
chooses table $x_i$ for a seat. Then, the utility function of customer $i$ is given by $U(R_{x_i}, n_{x_i})$, where $n_{x_i}$
is the number of customers choosing table $x_i$. According to our previous discussion, the utility function
should be an increasing function of $R_{x_i}$ and a decreasing function of $n_{x_i}$. Note that the decreasing
characteristic of $U(R_{x_i}, n_{x_i})$ over $n_{x_i}$ can be regarded as the negative network externality effect, since
the degradation of the utility is due to the joining of other customers. Finally, let $\mathbf{n} = \{n_1, n_2, \ldots, n_K\}$
be the numbers of customers at the $K$ tables, i.e., the grouping of customers in the restaurant.
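
To make the notation concrete, the following minimal sketch encodes the two-table example: the table size functions, a grouping computed from an action profile, and the resulting utilities. The numeric sizes and the equal-sharing utility U(R, n) = R / n are assumptions made only for illustration; the model itself requires only that U increases in R and decreases in n.

```python
from collections import Counter

L_SIZE, S_SIZE = 100.0, 40.0  # hypothetical table sizes with L > S

# R[j][theta]: size of table j under state theta, as in the two-table example.
R = {1: {1: L_SIZE, 2: S_SIZE},
     2: {1: S_SIZE, 2: L_SIZE}}


def utility(size, n):
    """Hypothetical U(R, n): a table of the given size shared equally by n customers."""
    return size / n


def grouping(actions, num_tables):
    """Return [n_1, ..., n_K] for an action profile with 1-based table choices."""
    counts = Counter(actions)
    return [counts.get(j, 0) for j in range(1, num_tables + 1)]


def realized_utilities(actions, theta):
    """Utility of each customer given everyone's table choice and the state."""
    n = grouping(actions, len(R))
    return [utility(R[x][theta], n[x - 1]) for x in actions]


print(grouping([1, 1, 2], 2))
print(realized_utilities([1, 1, 2], theta=1))
```

In this example the two customers sharing the large table each obtain less than the table's full size, which is the negative network externality in its simplest form.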

As mentioned above, the restaurant is in a state $\theta \in \Theta$. However, customers may not know the exact
state $\theta$, i.e., they may not know the exact size of each table before requesting. Instead, they may have
received some advertisements or gathered some reviews about the restaurant. This information can be
treated as signals related to the true state of the restaurant. In such a case, they can
estimate $\theta$ through the available information, i.e., the information they know and/or gather during the game.
Therefore, we assume that all customers know the prior distribution of the state $\theta$,
which is denoted as $\mathbf{g}_0 = \{g_{0,l} \mid g_{0,l} = \Pr(\theta = l), \forall l \in \Theta\}$. The signal $s_i \in \mathcal{S}$ that each customer receives is
generated from a predefined distribution $f(s \mid \theta)$.

A. Belief on State 

In this subsection, we introduce the concept of belief to describe how the customers estimate the
system state $\theta$. Since customers may make decisions sequentially, it is possible that the customers who
make decisions later learn the signals of those customers who make decisions first. Let us denote the
signals customer $i$ learned, excluding his own signal $s_i$, as $\mathbf{h}_i = \{s\}$. With the help of these signals $\mathbf{h}_i$,
his own signal $s_i$, the prior distribution $\mathbf{g}_0$, and the conditional distribution $f(s \mid \theta)$, each customer $i$ can
estimate the current system state in probability, with the belief being defined as

$$\mathbf{g}_i = \{g_{i,l} \mid g_{i,l} = \Pr(\theta = l \mid \mathbf{h}_i, s_i, \mathbf{g}_0, f), \forall l \in \Theta\}, \quad \forall i \in \{1, \ldots, N\}. \tag{1}$$

According to the above definition, $g_{i,l}$ represents the probability that the system state $\theta$ is equal to $l$
conditioned on the collected signals $\mathbf{h}_i$, the received signal $s_i$, the prior probability $\mathbf{g}_0$, and the conditional
distribution $f(s \mid \theta)$. Notice that in the social learning literature, the belief can be obtained through either
non-Bayesian naive updating rule HI, Q or fully rational Bayesian rule Q. For the non-Bayesian naive 
updating rule, it is implicitly assumed that customers have only limited rationality and follow
some predefined rules to compute their beliefs. Their capability to maximize their utilities is limited not
only by the game structure and the learned information, but also by the non-Bayesian naive updating rules.
In the fully rational Bayesian rule, customers are fully rational and have the potential to optimize their
actions without being restricted to a fixed belief updating rule. Since the customers we consider here
are fully rational, they follow the Bayesian rule to update their beliefs as follows:

$$g_{i,l} = \frac{g_{0,l} \Pr(\mathbf{h}_i, s_i \mid \theta = l)}{\sum_{l' \in \Theta} g_{0,l'} \Pr(\mathbf{h}_i, s_i \mid \theta = l')}. \tag{2}$$

Notice that the exact expression for belief updating depends on how the signals are generated and learned,
which is generally affected by the conditional distribution $f(s \mid \theta)$ and the game structure.
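
As a small numerical illustration of the Bayesian update above, the sketch below computes the belief for a two-state example. It assumes that signals are conditionally independent given the state, so that the joint likelihood of the observed signals factors into a product of f(s | theta) terms; this independence, the function name `bayes_belief`, and all numbers are assumptions for illustration, since the exact expression depends on the game structure.

```python
def bayes_belief(prior, likelihood, signals):
    """Posterior over states given observed signals.

    prior: {state: probability}, the prior g_0.
    likelihood[state][signal]: f(signal | theta = state).
    signals: the learned signals h_i together with the own signal s_i,
             assumed conditionally independent given the state.
    """
    unnormalized = {}
    for state, p in prior.items():
        joint = p
        for s in signals:
            joint *= likelihood[state][s]
        unnormalized[state] = joint
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed signals have zero probability under the prior")
    return {state: v / total for state, v in unnormalized.items()}


prior = {1: 0.5, 2: 0.5}
likelihood = {1: {"s1": 0.8, "s2": 0.2},
              2: {"s1": 0.2, "s2": 0.8}}
belief = bayes_belief(prior, likelihood, ["s1", "s1", "s2"])
print(belief)
```

With two signals pointing to state 1 and one pointing to state 2, the belief shifts toward state 1 without becoming certain, as expected for noisy signals.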

IV. Simultaneous Game with Perfect Signal: How Negative Network Externality Affects

The first game structure we would like to discuss is the simultaneous game, in which all customers make 
decisions simultaneously, e.g., all agents arrive at the restaurant at the same time. In such a scenario, there
is no learning involved in the game since customers request the tables at the same time. By investigating 
this game model, we can have an initial understanding on how customers behave in the game. 

We start with a simple case where there are only two customers and two tables. In such a case, there
are two possible system states, $\Theta = \{\theta_1, \theta_2\}$, indicating which table is larger. When the system state is $\theta_1$,
$R_1(\theta_1) = L$ and $R_2(\theta_1) = S$, where $L > S$. On the other hand, if the system state is $\theta_2$, then $R_1(\theta_2) = S$
and $R_2(\theta_2) = L$. Moreover, in such a scenario, the signal is assumed to perfectly reveal the system state
and indicate the exact size of each table, e.g., $\mathcal{S} = \{s_1, s_2\}$, $f(s_1 \mid \theta = \theta_1) = 1$, and
$f(s_2 \mid \theta = \theta_2) = 1$. Under such a signal structure, a customer immediately knows what the system
state is when receiving the signal. The customer also knows that his opponent has the perfect signal
of the true system state, i.e., the opponent also knows the exact size of each table. With such a simple
setting, we temporarily remove the uncertainty on the state to see how customers make decisions given
the network externality.

Given the opponent's decision, which may be the larger table or the smaller table, a rational customer
should choose the table that maximizes his utility. If the decision made by the opponent is the smaller
one, then the customer's best choice is the larger one since $U(L, 1) > U(S, 1)$. However, if the opponent
chooses the larger table, the customer's choice depends on how severe the negative network externality
is. If $U(L, 2) > U(S, 1)$, the customer will choose the same larger table. Otherwise, the smaller table
will be chosen. The result shows that even if both customers have already learned the true system state, i.e.,
which table is larger, they may not both choose the larger one in the equilibrium. Instead, the equilibrium
depends on how severe the negative network externality is. If the externality results in an unacceptable
penalty, then customers should choose different tables to avoid it.
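
The two-customer case can be checked by enumerating all four action profiles. The sketch below is an illustration under assumptions: it uses the hypothetical equal-sharing utility U(R, n) = R / n and two hypothetical pairs of table sizes, one where sharing the larger table still beats the smaller table and one where it does not.

```python
from itertools import product


def utility(size, n):
    """Hypothetical U(R, n): equal sharing of the table among n customers."""
    return size / n


def pure_nash_2x2(sizes):
    """Return all pure-strategy Nash equilibria (x_1, x_2), tables numbered 1 and 2.

    sizes = (R_1, R_2) under the (perfectly known) system state.
    """
    def payoff(me, other):
        n = 2 if me == other else 1
        return utility(sizes[me], n)

    equilibria = []
    for a, b in product((0, 1), repeat=2):
        a_ok = payoff(a, b) >= payoff(1 - a, b)  # customer 1 cannot gain by switching
        b_ok = payoff(b, a) >= payoff(1 - b, a)  # customer 2 cannot gain by switching
        if a_ok and b_ok:
            equilibria.append((a + 1, b + 1))
    return equilibria


print(pure_nash_2x2((100.0, 40.0)))  # U(L, 2) = 50 > U(S, 1) = 40
print(pure_nash_2x2((100.0, 60.0)))  # U(L, 2) = 50 < U(S, 1) = 60
```

In the first case both customers end up at the larger table; in the second the externality is severe enough that the only equilibria place the customers at different tables.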

A. Best Response of Customers Under Perfect Signal 

Now let us consider the general scenario where there are $N$ customers, $K$ tables, and $L$ possible states
$\Theta = \{\theta_1, \ldots, \theta_L\}$. Here, we consider the perfect signal case, i.e., the system state $\theta$ and the sizes of
tables $R_1(\theta), R_2(\theta), \ldots, R_K(\theta)$ are known by all customers. The imperfect signal case will be discussed
in Section VI. Since the customers are rational, their objectives in this game are to maximize their own
utilities. However, since their utilities are determined by not only their own actions but also others', the
customers' behaviors in the game are influenced by each other.

A strategy describes how a player will play given any possible situation in the game. In the simultaneous
Chinese restaurant game, the customer's strategy should be a mapping from other customers'
table selections to his own table selection. Recall that $n_j$ stands for the number of customers choosing
table $j$. Let us denote $\mathbf{n}_{-i} = \{n_{-i,1}, n_{-i,2}, \ldots, n_{-i,K}\}$, with $n_{-i,j}$ being the number of customers except
customer $i$ choosing table $j$. Then, given $\mathbf{n}_{-i}$, a rational customer $i$ should choose the action as

$$BE_i(\mathbf{n}_{-i}, \theta) = \arg\max_{x \in \mathcal{X}} U(R_x(\theta), n_{-i,x} + 1). \tag{3}$$

Equation (3) describes a special strategy called the best response, which represents the optimal action of
a customer that maximizes the utility given other customers' actions. In the following, we give a formal
definition of best response.
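
The best response in (3) is a one-line maximization once the table sizes and the other customers' grouping are known. The sketch below is a minimal illustration; the equal-sharing utility and the example numbers are assumptions, and ties are broken in favor of the lowest-numbered table, a choice the text does not specify.

```python
def utility(size, n):
    """Hypothetical U(R, n): equal sharing of the table among n customers."""
    return size / n


def best_response(n_minus_i, sizes):
    """Return the table (1-based) maximizing U(R_x, n_{-i,x} + 1), as in (3).

    n_minus_i[x]: number of customers other than i at table x + 1.
    sizes[x]: R_{x+1}(theta) under the known state theta.
    Ties are resolved toward the lowest table index (an arbitrary choice).
    """
    best = max(range(len(sizes)),
               key=lambda x: utility(sizes[x], n_minus_i[x] + 1))
    return best + 1


sizes = [100.0, 40.0]
print(best_response([1, 0], sizes))  # one other customer at the large table
print(best_response([2, 0], sizes))  # two other customers at the large table
```

As the large table fills up, the best response switches to the smaller but empty table.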

Definition 1. Consider a game with $N$ players, each with an action space $\mathcal{A}_i$ and a utility function
$U_i(x_i, x_{-i})$, where $x_i$ is player $i$'s action and $x_{-i}$ is the actions of all players except player $i$. The best
response of player $i$ is

$$BE_i(x_{-i}) = \arg\max_{x \in \mathcal{A}_i} U_i(x, x_{-i}). \tag{4}$$



B. Nash Equilibrium Under Perfect Signal 

Nash equilibrium is a popular concept for predicting the outcome of a game with rational customers. 
Informally speaking, Nash equilibrium is an action profile, where each customer's action is the best 
response to other customers' actions in the profile. Since all customers use their best responses, none of 
them have the incentive to deviate from their actions. A formal definition of Nash equilibrium for the 
simultaneous game is given as follows. 



Definition 2 (Nash Equilibrium). A Nash equilibrium is an action profile $\mathbf{x}^* = \{x_1^*, x_2^*, \ldots, x_N^*\}$ where
$BE_i(\mathbf{x}_{-i}^*) = x_i^*$ for all $i \in \{1, \ldots, N\}$.



According to the definition of Nash equilibrium, the sufficient and necessary condition of Nash 
equilibrium in the simultaneous Chinese restaurant game is stated in the following theorem. 

Theorem 1. Given the customer set $\{1, \ldots, N\}$, the table set $\{1, \ldots, K\}$, and the current system state $\theta$,
for any Nash equilibrium of the simultaneous Chinese restaurant game with perfect signal, its equilibrium
grouping $\mathbf{n}^*$ should satisfy the following condition:

$$U(R_x(\theta), n_x^*) \geq U(R_y(\theta), n_y^* + 1), \quad \text{if } n_x^* > 0, \ \forall x, y \in \{1, \ldots, K\}. \tag{5}$$

Proof:

• Sufficient condition: suppose that the action profile of all players is $\mathbf{x} = \{x_1, \ldots, x_N\}$ and such an
action profile leads to the grouping $\mathbf{n}^* = \{n_1^*, \ldots, n_K^*\}$ that satisfies (5). Without loss of generality,
let us assume that customer $i$ chooses table $j$, i.e., $x_i = j$. Then we have

$$U_i(x_i, x_{-i}) = U(R_j(\theta), n_j^*). \tag{6}$$

If customer $i$ chooses any other table $k \neq j$, i.e., $x_i' = k \neq x_i = j$, then his utility becomes

$$U_i(x_i', x_{-i}) = U(R_k(\theta), n_k^* + 1). \tag{7}$$

Since $U(R_j(\theta), n_j^*) \geq U(R_k(\theta), n_k^* + 1)$, $\forall j, k$, we have $BE_i(x_{-i}) = x_i$, $\forall i$. Therefore, $\mathbf{x} =
\{x_1, \ldots, x_N\}$ is a Nash equilibrium.
• Necessary condition: suppose that the Nash equilibrium $\mathbf{x}^* = \{x_1^*, x_2^*, \ldots, x_N^*\}$ leads to the grouping
$\mathbf{n}^* = \{n_1^*, \ldots, n_K^*\}$. Without loss of generality, let us assume that customer $i$ chooses table $j$, i.e.,
$x_i^* = j$. Then we have

$$U_i(x_i^*, x_{-i}^*) = U(R_j(\theta), n_j^*) \ \text{and} \ n_j^* > 0. \tag{8}$$

If customer $i$ chooses any other table $k \neq j$, i.e., $x_i' = k \neq x_i^* = j$, then his utility becomes

$$U_i(x_i', x_{-i}^*) = U(R_k(\theta), n_k^* + 1). \tag{9}$$

Since $\mathbf{x}^* = \{x_1^*, x_2^*, \ldots, x_N^*\}$ is a Nash equilibrium, we have $BE_i(x_{-i}^*) = x_i^*$, $\forall i$, i.e.,

$$U(R_j(\theta), n_j^*) \geq U(R_k(\theta), n_k^* + 1), \quad n_j^* > 0, \ \forall j, k \in \{1, \ldots, K\}. \tag{10}$$

■

From Theorem 1 we can see that, at a Nash equilibrium, no customer's utility would become
higher by deviating to another table. Moreover, any deviation to another table would degrade the utility of
all customers at that table due to the negative network externality. Condition (5) also implies that customers
may eventually have different utilities even if the tables they choose have the same size. A simple example
would be a three-customer restaurant with two tables of exactly the same size. Since there are three customers,
at the Nash equilibrium, one of the tables must be chosen by two customers while the other table is
occupied by only one customer.
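
The three-customer example can be verified directly against the equilibrium condition of Theorem 1. The sketch below is illustrative only: the equal-sharing utility and the common table size of 50 are assumptions, and the helper name `satisfies_condition_5` is ours.

```python
def utility(size, n):
    """Hypothetical U(R, n): the table is shared equally by n customers."""
    return size / n


def satisfies_condition_5(n, sizes):
    """Check U(R_x, n_x) >= U(R_y, n_y + 1) for every occupied table x and every other table y."""
    for x, n_x in enumerate(n):
        if n_x == 0:
            continue  # the condition only constrains occupied tables
        for y, n_y in enumerate(n):
            if y != x and utility(sizes[x], n_x) < utility(sizes[y], n_y + 1):
                return False
    return True


sizes = [50.0, 50.0]  # two tables of the same (hypothetical) size
for n in ([3, 0], [2, 1], [1, 2], [0, 3]):
    per_table_utility = [utility(sizes[j], n[j]) for j in range(2) if n[j] > 0]
    print(n, satisfies_condition_5(n, sizes), per_table_utility)
```

Only the two-and-one splits pass the check, and in them the customer sitting alone obtains a strictly higher utility than the two who share, even though both tables have the same size.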

C. Uniqueness of Equilibrium Grouping 

Obviously, there will be more than one Nash equilibrium, since we can always exchange the actions
of any two customers in one Nash equilibrium to build a new Nash equilibrium without violating the
sufficient and necessary condition shown in (5). Nevertheless, the equilibrium grouping $\mathbf{n}^*$ may be unique,
as stated in the following theorem.

Theorem 2. There exists a Nash equilibrium in the simultaneous game with perfect signal. If the inequality
in (5) strictly holds for all $x, y \in \{1, \ldots, K\}$, then the equilibrium grouping $\mathbf{n}^* = \{n_1^*, \ldots, n_K^*\}$ is unique.
Proof: Since the signals are perfect, the current system state $\theta$ is known by all customers. In
the following, we first propose a greedy algorithm to construct a Nash equilibrium and then show the
uniqueness of the equilibrium grouping when the inequality strictly holds. Since exchanging the actions of
any two customers at a Nash equilibrium leads to another Nash equilibrium, the Nash equilibrium is
generally not unique. Without loss of generality, in the proposed greedy algorithm, the customers choose
their actions sequentially, with customer $i$ being the $i$-th customer to choose an action. We let customers
choose their actions in a myopic way, i.e., they choose the tables that maximize their current utilities
purely based on what they have observed. Let $\mathbf{n}_i = \{n_{i,1}, n_{i,2}, \ldots, n_{i,K}\}$ with $\sum_{j=1}^{K} n_{i,j} = i - 1$ be the
grouping observed by customer $i$. Then, customer $i$ will choose the myopic action given by

$$BE_i^{\text{myopic}}(\mathbf{n}_i, \theta) = \arg\max_{x \in \mathcal{X}} U(R_x(\theta), n_{i,x} + 1). \tag{11}$$

Let $\mathbf{x}^* = \{x_1^*, x_2^*, \ldots, x_N^*\}$ be the output action set of the proposed greedy algorithm and $\mathbf{n}^* =
\{n_1^*, n_2^*, \ldots, n_K^*\}$ be the corresponding grouping. For any table $j$ with $n_j^* > 0$, suppose customer $k$
is the last customer choosing table $j$. Then, we have

$$U(R_j(\theta), n_{k,j} + 1) \geq U(R_{j'}(\theta), n_{k,j'} + 1), \quad \forall j' \in \{1, \ldots, K\}. \tag{12}$$

Since customer $k$ is the last customer choosing table $j$, we have $n_j^* = n_{k,j} + 1$ and $n_{j'}^* \geq n_{k,j'}$. Then,
according to (12), $\forall j' \in \{1, \ldots, K\}$, we have

$$U(R_j(\theta), n_j^*) = U(R_j(\theta), n_{k,j} + 1) \geq U(R_{j'}(\theta), n_{k,j'} + 1) \geq U(R_{j'}(\theta), n_{j'}^* + 1), \tag{13}$$

where the last inequality comes from the fact that $U(\cdot)$ is a decreasing function in terms of $n$.

Note that (13) holds for all $j \in \{1, \ldots, K\}$ with $n_j^* > 0$, i.e., $U(R_j(\theta), n_j^*) \geq U(R_{j'}(\theta), n_{j'}^* + 1)$,
$\forall j, j' \in \{1, \ldots, K\}$ with $n_j^* > 0$. According to Theorem 1 and (13), the output action set $\mathbf{x}^* =
\{x_1^*, x_2^*, \ldots, x_N^*\}$ from the proposed greedy algorithm is a Nash equilibrium.
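
The greedy construction used in this part of the proof is easy to run on a small instance. The sketch below is a minimal illustration under assumptions: the equal-sharing utility and the three table sizes are hypothetical, ties are broken toward the lowest table index, and the brute-force enumeration is added only to compare the greedy output with every grouping that satisfies the condition of Theorem 1.

```python
from itertools import product


def utility(size, n):
    """Hypothetical U(R, n): equal sharing of the table among n customers."""
    return size / n


def greedy_grouping(num_customers, sizes):
    """Customers pick tables one by one, each maximizing U(R_x, n_{i,x} + 1) myopically."""
    n = [0] * len(sizes)
    actions = []
    for _ in range(num_customers):
        x = max(range(len(sizes)), key=lambda j: utility(sizes[j], n[j] + 1))
        n[x] += 1
        actions.append(x + 1)  # report 1-based table numbers
    return actions, n


def satisfies_condition_5(n, sizes):
    """Equilibrium check: no customer at an occupied table gains by moving elsewhere."""
    return all(utility(sizes[x], n[x]) >= utility(sizes[y], n[y] + 1)
               for x in range(len(sizes)) if n[x] > 0
               for y in range(len(sizes)) if y != x)


def all_equilibrium_groupings(num_customers, sizes):
    """Brute-force search over all groupings of num_customers into len(sizes) tables."""
    return [list(n) for n in product(range(num_customers + 1), repeat=len(sizes))
            if sum(n) == num_customers and satisfies_condition_5(n, sizes)]


sizes = [100.0, 40.0, 30.0]
actions, n_star = greedy_grouping(5, sizes)
print(actions, n_star)
print(all_equilibrium_groupings(5, sizes))
```

For this instance the inequalities hold strictly at the greedy output, and the exhaustive search finds that grouping and no other.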

Next, we would like to prove by contradiction that if the inequality in (5) strictly holds, the equilibrium
grouping $\mathbf{n}^* = \{n_1^*, \ldots, n_K^*\}$ is unique. Suppose that there exists another Nash equilibrium with
equilibrium grouping $\mathbf{n}' = \{n_1', \ldots, n_K'\}$, where $n_j' \neq n_j^*$ for some $j \in \{1, \ldots, K\}$. Since both $\mathbf{n}^*$ and $\mathbf{n}'$
are equilibrium groupings, we have $\sum_{j=1}^{K} n_j' = \sum_{j=1}^{K} n_j^* = N$. In such a case, there exist two tables $x$
and $y$ with $n_x' > n_x^*$ and $n_y' < n_y^*$. Then, since $\mathbf{n}^*$ is an equilibrium grouping with $n_y^* > 0$ and the inequality in (5) holds strictly, we have

$$U(R_y(\theta), n_y^*) > U(R_x(\theta), n_x^* + 1). \tag{14}$$

Since $n_x' > n_x^*$, $n_y' < n_y^*$, and $U(\cdot)$ is a decreasing function of $n$, we have

$$U(R_x(\theta), n_x^*) \geq U(R_x(\theta), n_x^* + 1) \geq U(R_x(\theta), n_x') \tag{15}$$

and

$$U(R_y(\theta), n_y') \geq U(R_y(\theta), n_y' + 1) \geq U(R_y(\theta), n_y^*). \tag{16}$$

Since $\mathbf{n}'$ is also an equilibrium grouping and $n_x' > 0$, we have

$$U(R_x(\theta), n_x') \geq U(R_y(\theta), n_y' + 1). \tag{17}$$

According to (15), (16), and (17), we have

$$U(R_x(\theta), n_x^* + 1) \geq U(R_x(\theta), n_x') \geq U(R_y(\theta), n_y' + 1) \geq U(R_y(\theta), n_y^*), \tag{18}$$



which contradicts (14). Therefore, the equilibrium grouping $\mathbf{n}^*$ is unique when the inequality in the
equilibrium grouping condition holds strictly.

V. Sequential Game with Perfect Signal: The Advantage of Playing First

In the previous section, we studied how negative network externality affects the action of each
customer and found that a balance will finally be achieved among tables such that there will be no
overwhelming requests for one table. However, we also find that some customers may have higher
utilities than others at the Nash equilibrium. In this section, we extend the Chinese restaurant game into
a sequential game model, where customers choose their actions in a pre-determined order. We assume that
every customer can observe all the actions chosen before him, but cannot change his action once chosen.

In this sequential Chinese restaurant game, customers make decisions sequentially in a predetermined
order known by all customers, e.g., waiting in a queue outside the restaurant. Without loss of
generality, in the rest of this paper, we assume the order is the same as the customer's number, i.e.,
the order of customer $i$ is $i$. We assume every customer knows the decisions of the customers who
make decisions before him, i.e., customer $i$ knows the decisions of customers $\{1, \ldots, i - 1\}$. Let
$\mathbf{n}_i = \{n_{i,1}, n_{i,2}, \ldots, n_{i,K}\}$ be the current grouping, i.e., the number of customers choosing tables
$\{1, 2, \ldots, K\}$ before customer $i$. The grouping $\mathbf{n}_i$ roughly represents how crowded each table is when
customer $i$ enters the restaurant. Notice that $\mathbf{n}_i$ may not be equal to $\mathbf{n}$, which is the final grouping that
determines customers' utilities. A table with only a few customers may eventually be chosen by many
customers at the end of the game. The best response function of customer $i$ is

$$BE_i^{se}(\mathbf{n}_i, \theta) = \arg\max_{x} U(R_x(\theta), \hat{n}_x(\mathbf{n}_i)), \tag{19}$$

where $\hat{n}_x(\mathbf{n}_i)$ denotes the expected number of customers choosing table $x$ given $\mathbf{n}_i$. The problem here
is how to predict the decisions of the remaining customers given the current observation $\mathbf{n}_i$ and state $\theta$.

A. Subgame Perfect Nash Equilibrium and Advantage of Playing First

In this subsection, we study the possible equilibria of the sequential Chinese restaurant game. In
particular, we study the subgame perfect Nash equilibrium. A subgame is a part of the original game.
In our sequential Chinese restaurant game, any game process that begins from player $i$, given all possible
actions before player $i$, could be a subgame. A formal definition of a subgame is given as follows.

Definition 3. A subgame in the sequential Chinese restaurant game consists of two elements: 1)
it begins from customer $i$; 2) the current grouping before customer $i$ is
$\mathbf{n}_i = \{n_{i,1}, \ldots, n_{i,K}\}$ with $\sum_{j=1}^{K} n_{i,j} = i - 1$.

With the definition of a subgame, a subgame perfect Nash equilibrium is defined as follows.

Definition 4. A Nash equilibrium is a subgame perfect Nash equilibrium if and only if it is a Nash
equilibrium for any subgame.

With the concept of subgame perfect Nash equilibrium, we can refine the set of Nash equilibria
in the original game. We show the existence of a subgame perfect Nash equilibrium in the sequential
Chinese restaurant game by constructing one. Given a subgame, the corresponding equilibrium grouping
and the best responses of agents in the subgame can be derived through two functions as follows. Let
$EG(X_s, N_s)$ be the function that generates the equilibrium grouping for a table set $X_s$ and number of
customers $N_s$. The equilibrium grouping is the one satisfying the equilibrium grouping condition, with $X$
replaced by $X_s$ and $N$ replaced by $N_s$. Notice that $X_s$ could be any subset of the total table set
$X = \{1, \ldots, K\}$, and $N_s$ is less than or equal to $N$. We will prove in Lemma 4 that the output of
$EG(\cdot)$ is the corresponding equilibrium grouping in the subgame. Then, let $PC(X_s, \mathbf{n}_s, N_s)$, where
$\mathbf{n}_s$ denotes the current grouping observed by the customers, be the algorithm that generates the set of
available tables given the current grouping $\mathbf{n}_s$ in the subgame. The algorithm removes the tables that
have already been over-requested, i.e., the tables that are already occupied by more than the expected
number of customers in the equilibrium grouping. The procedure of $PC(X_s, \mathbf{n}_s, N_s)$ is as follows:

1) Initialize: $X_o = X_s$, $N_t = N_s$.

2) $X_t = X_o$, $\mathbf{n}^t = EG(X_t, N_t)$, $X_o = \{x \mid x \in X_t,\ n^t_x \ge n_{s,x}\}$, $N_t = N_s - \sum_{x \in X_s \setminus X_o} n_{s,x}$.

3) If $X_o \ne X_t$, go back to step 2.

4) Output $X_o$.
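
The four steps above can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the utility is assumed to be $U(R, n) = R/n$, $EG(\cdot)$ is realized with the greedy construction of the previous section, $N_s$ is read as the number of customers of the subgame including those already seated, and the names `eg` and `pc` as well as the dictionary-based data layout are hypothetical.

```python
def utility(size, n):
    """Assumed utility: the table size is shared equally among its n customers."""
    return size / n


def eg(tables, num_customers, sizes):
    """Equilibrium grouping over `tables`, built with the greedy (myopic) rule."""
    counts = {x: 0 for x in tables}
    for _ in range(num_customers):
        best = max(sorted(tables), key=lambda x: utility(sizes[x], counts[x] + 1))
        counts[best] += 1
    return counts


def pc(tables, observed, num_customers, sizes):
    """Return the tables that are not over-requested under the grouping `observed`."""
    x_out = set(tables)                      # step 1: X_o = X_s, N_t = N_s
    n_t = num_customers
    while x_out:
        x_t = x_out                          # step 2: X_t = X_o
        n_eq = eg(x_t, n_t, sizes)           # n^t = EG(X_t, N_t)
        x_out = {x for x in x_t if n_eq[x] >= observed[x]}
        # Customers sitting at removed tables no longer compete for the rest.
        n_t = num_customers - sum(observed[x] for x in tables if x not in x_out)
        if x_out == x_t:                     # step 3: repeat until nothing is removed
            break
    return x_out                             # step 4


sizes = {1: 100.0, 2: 40.0}
print(pc({1, 2}, {1: 0, 2: 2}, 3, sizes))    # table 2 already holds two customers
```

In this example the equilibrium grouping for three customers puts only one customer at table 2, so a table 2 that already holds two customers is removed and only table 1 remains available.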

As shown in the following lemma, $PC(X_s, \mathbf{n}_s, N_s)$ never removes a table that is a best choice
of the customers.

Lemma 3. Given a subgame with current grouping $\mathbf{n}_s$, current available table set $X_s$, and the number
of players $N_s$, if table $j \in X_s$ is removed by the algorithm, i.e., $j \notin PC(X_s, \mathbf{n}_s, N_s)$, then there
exists at least one table $j' \in X_s$ such that $U(R_{j'}(\theta), n'_{j'}) > U(R_j(\theta), n_{s,j})$, where $\mathbf{n}'$ denotes
the grouping at the Nash equilibrium of this subgame.

Proof:
Let $\mathbf{n}^* = EG(X_s, N_s)$. Since table $j$ is removed by $PC(X_s, \mathbf{n}_s, N_s)$, the inequality $n_{s,j} > n_j^*$
must hold. However, since $n_{s,j} > n_j^*$, the equilibrium grouping $\mathbf{n}^*$ cannot be reached in the
Nash equilibrium of this subgame. Assuming the grouping at the Nash equilibrium of this subgame is $\mathbf{n}'$, we have
$n'_j \ge n_{s,j} > n_j^*$, which means $\sum_{k \in X_s, k \ne j} n'_k < \sum_{k \in X_s, k \ne j} n_k^*$. Therefore, there exists a
$j' \in X_s \setminus \{j\}$ such that $n'_{j'} < n_{j'}^*$.

Since $\mathbf{n}^*$ is an equilibrium grouping, we have

$$U(R_{j'}(\theta), n_{j'}^*) \ge U(R_j(\theta), n_j^* + 1). \tag{20}$$

According to the above discussions, we have

$$U(R_{j'}(\theta), n'_{j'}) > U(R_{j'}(\theta), n_{j'}^*) \ge U(R_j(\theta), n_j^* + 1) \ge U(R_j(\theta), n_{s,j}) \ge U(R_j(\theta), n'_j), \tag{21}$$

where the first inequality is due to $n'_{j'} < n_{j'}^*$, the second inequality is due to (20), and the last two
inequalities are due to $n'_j \ge n_{s,j} > n_j^*$.
According to (21), there exists at least one table $j'$ that can give the customer a higher utility than
table $j$ in this subgame. Therefore, table $j$ is never the best response of the customer. ■

Now, we propose a method to construct a subgame perfect Nash equilibrium. This equilibrium will
satisfy the equilibrium grouping condition. For each customer $i$, his strategy is described as follows:

$$BE_i^{se}(\mathbf{n}_i, \theta) = \arg\max_{x \in X^{i,cand},\ n_{i,x} < n_x^{i,cand}} U(R_x(\theta), n_x^{i,cand}), \tag{22}$$

where

$$X^{i,cand} = PC(X, \mathbf{n}_i, N), \tag{23}$$

$$N^{i,cand} = N - \sum_{x \in X \setminus X^{i,cand}} n_{i,x}, \tag{24}$$

$$\mathbf{n}^{i,cand} = EG(X^{i,cand}, N^{i,cand}). \tag{25}$$

In Lemma 4, we show that the above strategy results in the equilibrium grouping in any subgame.

Lemma 4. Given the available table set $X_s = PC(X, \mathbf{n}_s, N)$ and $N_s = N - \sum_{x \in X \setminus X_s} n_{s,x}$, the proposed
strategy shown in (22) leads to an equilibrium grouping $\mathbf{n}^* = EG(X_s, N_s)$.

Proof: We prove this by contradiction. Let $\mathbf{n} = \{n_j \mid j \in X_s\}$ be the final grouping after all customers
choose their tables according to (22). Suppose that $\mathbf{n} \ne \mathbf{n}^* = EG(X_s, N_s)$; then there exists some table
$j$ such that $n_j > n_j^*$. Let table $j$ be the first table that exceeds $n_j^*$ in this sequential subgame. Since $n_j > n_j^*$,
there are at least $n_j^* + 1$ customers choosing table $j$. Suppose the $(n_j^* + 1)$-th customer choosing table
$j$ is customer $i$. Let $\mathbf{n}_i = \{n_{i,1}, n_{i,2}, \ldots, n_{i,K}\}$ be the current grouping observed by customer $i$ before he
chooses the table. Since customer $i$ is the $(n_j^* + 1)$-th customer choosing table $j$, we have $n_{i,j} = n_j^*$.
Since table $j$ is the first table exceeding $\mathbf{n}^*$ after customer $i$'s choice, we have

$$n_{i,x} \le n_x^*, \quad \forall x \in X_s. \tag{26}$$

According to the definition of $PC(\cdot)$, none of the tables will be removed from the candidates. Thus,
$X^{i,cand} = X_s$ and $N^{i,cand} = N_s$. We have

$$\mathbf{n}^{i,cand} = EG(X^{i,cand}, N^{i,cand}) = EG(X_s, N_s) = \mathbf{n}^*. \tag{27}$$

However, according to (22), customer $i$ should not choose table $j$ since $n_{i,j} = n_j^* = n_j^{i,cand}$. This
contradicts our assumption that customer $i$ is the $(n_j^* + 1)$-th customer choosing table $j$. Thus, the
strategy (22) leads to the equilibrium grouping $\mathbf{n}^* = EG(X_s, N_s)$. ■

Note that Lemma 4 also shows that the final grouping of the sequential game is $\mathbf{n}^* = EG(X, N)$
if all customers follow the proposed strategy in (22). With Lemma 4, we show the existence of a subgame
perfect Nash equilibrium in the following theorem.

Theorem 5. Given customer set $\{1, \ldots, N\}$, table set $X = \{1, \ldots, K\}$, and the current state $\theta$, there
always exists a subgame perfect Nash equilibrium with the corresponding equilibrium grouping
$\mathbf{n}^* = \{n_1^*, \ldots, n_K^*\}$ satisfying the equilibrium grouping condition.

Proof: We first show that the proposed strategy in (22) forms a Nash equilibrium. Suppose
customer $i$ chooses table $j$ in his round according to (22). Then, customer $i$'s utility is $u_i = U(R_j(\theta), n_j^*)$
since, based on Lemma 4, the equilibrium grouping $\mathbf{n}^*$ will be reached at the end of the game.

If customer $i$ is the last customer, i.e., $i = N$, and chooses another table $j' \ne j$ in his round, then his
utility becomes $U(R_{j'}(\theta), n_{j'}^* + 1)$. However, according to the equilibrium grouping condition, we have

$$u_N = U(R_j(\theta), n_j^*) \ge U(R_{j'}(\theta), n_{j'}^* + 1). \tag{28}$$

Thus, choosing table $j$ is never worse than choosing table $j'$ for customer $N$.

Now suppose that customer $i$ is not the last customer and that he chooses table $j'$ instead of table $j$ in his round.
Since all customers before customer $i$ follow (22), we have $n_{i,j} \le n_j^*$, $\forall j \in X$. Otherwise, $\mathbf{n}^*$ cannot
be reached, which contradicts Lemma 4.

If $n_{i,j'} < n_{j'}^*$, we have $n_{i+1,j'} \le n_{j'}^*$. In addition, we have $n_{i+1,j} = n_{i,j} \le n_j^*$, $\forall j \in X \setminus \{j'\}$, since
the other tables are not chosen by customer $i$. Thus, $X^{i+1,cand} = PC(X, \mathbf{n}_{i+1}, N) = X$ and $N^{i+1,cand} = N$.
According to Lemma 4, the final grouping should be $\mathbf{n}^* = EG(X, N)$. Thus, the new utility of customer
$i$ becomes $u'_i = U(R_{j'}(\theta), n_{j'}^*)$. However, according to (22), we have

$$u_i = U(R_j(\theta), n_j^*) = \max_{x \in X,\ n_{i,x} < n_x^*} U(R_x(\theta), n_x^*) \ge U(R_{j'}(\theta), n_{j'}^*) = u'_i. \tag{29}$$

Thus, choosing table $j'$ never gives customer $i$ a higher utility.
If $n_{i,j'} = n_{j'}^*$, let the final grouping be $\mathbf{n}' = \{n'_1, n'_2, \ldots, n'_K\}$. Since customer $i$ chooses table $j'$ when
$n_{i,j'} = n_{j'}^*$, we have $n'_{j'} \ge n_{i+1,j'} = n_{i,j'} + 1 = n_{j'}^* + 1$. Thus, we have

$$u_i = U(R_j(\theta), n_j^*) \ge U(R_{j'}(\theta), n_{j'}^* + 1) \ge U(R_{j'}(\theta), n'_{j'}) = u'_i, \quad \forall j' \in X, \tag{30}$$

where the first inequality comes from the equilibrium grouping condition and the second inequality
comes from the fact that $U(R, n)$ is decreasing in $n$ and $n'_{j'} \ge n_{j'}^* + 1$. Thus, in both cases,
choosing table $j'$ is never better than choosing table $j$. We conclude that $\{BE_i^{se}(\cdot)\}$ in (22) forms a
Nash equilibrium, whose grouping is the equilibrium grouping $\mathbf{n}^*$.

Finally, we show that the proposed strategy forms a Nash equilibrium in every subgame. In Lemma 3,
we show that if table $j$ is removed by $PC(X, \mathbf{n}_s, N)$, it is never the best response of any remaining
customer. Thus, we only need to consider the remaining table candidates $X_s = PC(X, \mathbf{n}_s, N)$ in the
subgame. Then, with Lemma 4, we show that for every possible subgame with corresponding $X_s$, the
equilibrium grouping $\mathbf{n}^* = EG(X_s, N_s)$ will be achieved at the end of the subgame. Moreover, the above
proof shows that if the equilibrium grouping $\mathbf{n}^*$ is achieved at the end of the subgame, $BE_i^{se}(\cdot)$ is
the best response function. Therefore, the proposed strategies indeed form a Nash equilibrium in every
subgame, i.e., we have a subgame perfect Nash equilibrium. ■

In the proof of the subgame perfect Nash equilibrium, we observe that the sequential game structure
brings advantages to those customers making decisions at the beginning of the game. According to
(22), customers who make decisions early can choose the table that provides the largest utility in the
equilibrium. When the number of customers choosing that table reaches the equilibrium number, the second
best table will be chosen by subsequent customers until it is also full, and so on. The last customer has no
choice but to choose the worst one.
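
The advantage of playing first can be illustrated by letting every customer follow the strategy in (22) to (25). The sketch below is an illustration under stated assumptions rather than the authors' implementation: it assumes $U(R, n) = R/n$, realizes $EG(\cdot)$ with the greedy construction, breaks ties toward the lowest table index, and uses the hypothetical names `eg`, `pc`, `best_response`, and `play`.

```python
def utility(size, n):
    """Assumed utility: the table size is shared equally among its n customers."""
    return size / n


def eg(tables, num_customers, sizes):
    """Equilibrium grouping over `tables`, built with the greedy (myopic) rule."""
    counts = {x: 0 for x in tables}
    for _ in range(num_customers):
        best = max(sorted(tables), key=lambda x: utility(sizes[x], counts[x] + 1))
        counts[best] += 1
    return counts


def pc(tables, observed, num_customers, sizes):
    """Remove tables that already hold more customers than their equilibrium share."""
    x_out, n_t = set(tables), num_customers
    while x_out:
        x_t = x_out
        n_eq = eg(x_t, n_t, sizes)
        x_out = {x for x in x_t if n_eq[x] >= observed[x]}
        n_t = num_customers - sum(observed[x] for x in tables if x not in x_out)
        if x_out == x_t:
            break
    return x_out


def best_response(tables, observed, num_customers, sizes):
    """Strategy of (22): best table among candidates that are not yet full."""
    cand = pc(tables, observed, num_customers, sizes)                            # (23)
    n_cand_total = num_customers - sum(
        observed[x] for x in tables if x not in cand)                            # (24)
    n_cand = eg(cand, n_cand_total, sizes)                                       # (25)
    open_tables = [x for x in sorted(cand) if observed[x] < n_cand[x]]
    return max(open_tables, key=lambda x: utility(sizes[x], n_cand[x]))          # (22)


def play(sizes, num_customers):
    """Run the sequential game and return choices, final grouping, and utilities."""
    tables = set(sizes)
    counts = {x: 0 for x in tables}
    choices = []
    for _ in range(num_customers):
        x = best_response(tables, counts, num_customers, sizes)
        counts[x] += 1
        choices.append(x)
    utilities = [utility(sizes[x], counts[x]) for x in choices]
    return choices, counts, utilities


print(play({1: 100.0, 2: 40.0}, 3))
```

With table sizes 100 and 40 and three customers, customers 1 and 2 take the large table and obtain utility 50 each, while customer 3 is left with the small table and utility 40.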

VI. Imperfect Signal Model: How Learning Evolves 

In Section V, we showed that in the sequential game with perfect signal, customers choosing first
have the advantage of getting better tables and thus higher utilities. However, such a conclusion may not
hold when the signals are not perfect. When there are uncertainties about the table sizes, customers who
arrive first may not choose the right tables, so their utilities may be lower. Instead, customers
who arrive later may eventually have better chances of getting the better tables since they can collect more
information to make the right decisions. In other words, when signals are not perfect, learning will occur
and may result in higher utilities for customers choosing later. Therefore, there is a trade-off between
more choices when playing first and more accurate signals when playing later. In this section, we
study this trade-off by discussing the imperfect signal model.

In the imperfect signal model, we assume that the system state $\theta \in \Theta = \{1, 2, \ldots, L\}$ is unknown
to all $N$ customers. The sizes of the $K$ tables can be expressed as functions of $\theta$, which are denoted as
$R_1(\theta), R_2(\theta), \ldots, R_K(\theta)$. The prior probability of $\theta$, $\mathbf{g}_0 = \{g_{0,1}, g_{0,2}, \ldots, g_{0,L}\}$ with
$g_{0,l} = \Pr(\theta = l)$, is assumed to be known by all customers. Moreover, each customer receives a private
signal $s_i \in S$, which follows a p.d.f. $f(s \mid \theta)$. Here, we assume $f(s \mid \theta)$ is public information to all
customers. Conditioned on the system state $\theta$, the signals received by the customers are uncorrelated.

In this sequential Chinese restaurant game with the imperfect signal model, the customers make decisions
sequentially, with the decision order given by their numbers. After customer $i$ makes his decision, he cannot
change his mind at any subsequent time, and his decision and signal are revealed to all other customers.
We assume customers are fully rational, which means that they follow the Bayesian learning method to
learn the true state and choose their strategies to maximize their own utilities.

Since signals are revealed sequentially, the customers who make decisions later can collect more
information for better estimations of the system state. When a new signal is revealed, all customers
follow the Bayesian rule to update their beliefs based on their current beliefs. By Bayes' rule, we
have the following Bayesian belief updating function:

$$g_{i,l} = \frac{g_{i-1,l}\, f(s_i \mid \theta = l)}{\sum_{w=1}^{L} g_{i-1,w}\, f(s_i \mid \theta = w)}. \tag{31}$$

Based on the updating rule in (31), customer $i$ can update his belief when a new signal is revealed.
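
The update in (31) is a single normalization step, sketched below. The example assumes a binary signal that matches the true state with probability $p$ (the signal model used later in the simulations) and zero-based state indices; the names `update_belief` and `binary_likelihood` are hypothetical.

```python
def update_belief(belief, signal, likelihood):
    """One step of (31): reweight each state by f(signal | state) and normalize."""
    weights = [g * likelihood(signal, l) for l, g in enumerate(belief)]
    total = sum(weights)
    return [w / total for w in weights]


def binary_likelihood(p):
    """Assumed signal model: the signal equals the true state with probability p."""
    return lambda s, l: p if s == l else 1.0 - p


belief = [0.5, 0.5]                      # uniform prior over two states
for s in [0, 0, 1]:                      # signals revealed one after another
    belief = update_belief(belief, s, binary_likelihood(0.9))
    print(s, belief)
```

Two signals pointing to state 0 followed by one pointing to state 1 leave the same belief as a single signal pointing to state 0, because the contradicting pair cancels out under this symmetric signal model.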

A. Best Response of Customers 

Since the customers are rational, they will choose the action that maximizes their own expected utility
conditioned on the information they collect. Let $\mathbf{n}_i = \{n_{i,1}, n_{i,2}, \ldots, n_{i,K}\}$ be the current grouping
observed by customer $i$ before he chooses the table, where $n_{i,j}$ is the number of customers choosing
table $j$ before customer $i$. Then, let $\mathbf{h}_i = \{s_1, s_2, \ldots, s_{i-1}\}$ be the history of revealed signals before
customer $i$. In such a case, the best response of customer $i$ can be written as

$$x_i = BE_i(\mathbf{n}_i, \mathbf{h}_i, s_i) = \arg\max_{j} E[u_i(R_j(\theta), n_j) \mid \mathbf{n}_i, \mathbf{h}_i, s_i]. \tag{32}$$

From (32), we can see that when estimating the expected utility in the best response function, there
are two key terms to be estimated by the customer: the system state $\theta$ and the final grouping
$\mathbf{n} = \{n_1, n_2, \ldots, n_K\}$. The system state $\theta$ is estimated using the concept of belief, denoted as
$\mathbf{g}_i = \{g_{i,1}, g_{i,2}, \ldots, g_{i,L}\}$ with $g_{i,l} = \Pr(\theta = l \mid \mathbf{h}_i, s_i)$. Since the information on the system state $\theta$
in $\mathbf{n}_i$ is fully revealed by $\mathbf{h}_i$, given $\mathbf{h}_i$, $\mathbf{g}_i$ is independent of $\mathbf{n}_i$. Therefore, given the customer's
belief $\mathbf{g}_i$, the expected utility of customer $i$ choosing table $j$ becomes

$$E[u_i(R_j(\theta), n_j) \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i = j] = \sum_{w \in \Theta} g_{i,w}\, E[u_i(R_j(w), n_j) \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i = j, \theta = w]. \tag{33}$$

Note that the decisions of customers $i + 1, \ldots, N$ are unknown to customer $i$ when customer $i$ makes
the decision. Therefore, a closed-form solution to (33) is generally impossible and impractical. In this
paper, we propose a recursive approach to compute the expected utility.

B. Recursive Form of Best Response 

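Let $BE_{i+1}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1}, s_{i+1})$ be the best response function of customer $i + 1$. Then, according to
$BE_{i+1}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1}, s_{i+1})$, the signal space $S$ can be partitioned into subspaces with

$$S_{i+1,j}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1}) = \{s \mid s \in S,\ BE_{i+1}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1}, s) = j\}, \quad \forall j \in \{1, \ldots, K\}. \tag{34}$$

Based on (34), we can see that, given $\mathbf{n}_{i+1}$ and $\mathbf{h}_{i+1}$, $BE_{i+1}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1}, s_{i+1}) = j$ if and only
if $s_{i+1} \in S_{i+1,j}$. Therefore, the decision of customer $i + 1$ can be predicted according to the signal
distribution $f(s \mid \theta)$ as

$$\Pr(x_{i+1} = j \mid \mathbf{n}_{i+1}, \mathbf{h}_{i+1}) = \int_{s \in S_{i+1,j}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1})} f(s)\, ds. \tag{35}$$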

Let us define $m_{i,j}$ as the number of customers choosing table $j$ after customer $i$ (including customer
$i$ himself). Then, we have $n_j = n_{i,j} + m_{i,j}$, where $n_j$ denotes the final number of customers choosing
table $j$ at the end of the game. Moreover, according to the definition of $m_{i,j}$, we have

$$m_{i,j} = \begin{cases} 1 + m_{i+1,j}, & x_i = j; \\ m_{i+1,j}, & \text{else}. \end{cases} \tag{36}$$
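The recursive relation of $m_{i,j}$ in (36) will be used in the following to get the recursive form of the
best response function. We first derive the recursive form of the distribution of $m_{i,j}$, i.e.,
$\Pr(m_{i,j} = X \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i, \theta = l)$ can be expressed as a function of
$\Pr(m_{i+1,j} = X \mid \mathbf{n}_{i+1}, \mathbf{h}_{i+1}, s_{i+1}, x_{i+1}, \theta = l)$, $\forall l \in \Theta$, $1 \le j \le K$, as follows:

$$\begin{aligned}
\Pr(m_{i,j} = X \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i, \theta = l)
&= \begin{cases}
\Pr(m_{i+1,j} = X - 1 \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i, \theta = l), & x_i = j, \\
\Pr(m_{i+1,j} = X \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i, \theta = l), & x_i \ne j,
\end{cases} \\
&= \begin{cases}
\sum_{u \in \{1, \ldots, K\}} \int_{s \in S_{i+1,u}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1})} \Pr(m_{i+1,j} = X - 1 \mid \mathbf{n}_{i+1}, \mathbf{h}_{i+1}, s_{i+1} = s, x_{i+1} = u, \theta = l)\, f(s \mid \theta = l)\, ds, & x_i = j, \\
\sum_{u \in \{1, \ldots, K\}} \int_{s \in S_{i+1,u}(\mathbf{n}_{i+1}, \mathbf{h}_{i+1})} \Pr(m_{i+1,j} = X \mid \mathbf{n}_{i+1}, \mathbf{h}_{i+1}, s_{i+1} = s, x_{i+1} = u, \theta = l)\, f(s \mid \theta = l)\, ds, & x_i \ne j,
\end{cases}
\end{aligned} \tag{37}$$

where $\mathbf{h}_{i+1}$ and $\mathbf{n}_{i+1}$ can be obtained using

$$\mathbf{h}_{i+1} = \{\mathbf{h}_i, s_i\} \quad \text{and} \quad \mathbf{n}_{i+1} = \{n_{i+1,1}, \ldots, n_{i+1,K}\}, \tag{38}$$

with

$$n_{i+1,k} = \begin{cases} n_{i,k} + 1, & \text{if } x_i = k, \\ n_{i,k}, & \text{otherwise}. \end{cases} \tag{39}$$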

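Based on (37), $\Pr(m_{i,j} = X \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i, \theta = l)$ can be recursively calculated. Therefore, we can
calculate the expected utility $E[u_i(R_j(\theta), n_j) \mid \mathbf{n}_i, \mathbf{h}_i, s_i]$ as

$$E[u_i(R_j(\theta), n_j) \mid \mathbf{n}_i, \mathbf{h}_i, s_i] = \sum_{l \in \Theta} \sum_{x=0}^{N-i+1} g_{i,l} \Pr(m_{i,j} = x \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i = j, \theta = l)\, u_i(R_j(l), n_{i,j} + x). \tag{40}$$

Finally, the best response function of customer $i$ can be derived by

$$BE_i(\mathbf{n}_i, \mathbf{h}_i, s_i) = \arg\max_{j} \sum_{l \in \Theta} \sum_{x=0}^{N-i+1} g_{i,l} \Pr(m_{i,j} = x \mid \mathbf{n}_i, \mathbf{h}_i, s_i, x_i = j, \theta = l)\, u_i(R_j(l), n_{i,j} + x). \tag{41}$$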

With the recursive form, the best response functions of all customers can be obtained using backward
induction. The best response function of the last customer $N$ can be found as

$$BE_N(\mathbf{n}_N, \mathbf{h}_N, s_N) = \arg\max_{j} \sum_{l \in \Theta} g_{N,l}\, u_N(R_j(l), n_{N,j} + 1). \tag{42}$$

Note that $\Pr(m_{N,j} = X \mid \mathbf{n}_N, \mathbf{h}_N, s_N, x_N, \theta)$ can be easily derived as follows:

$$\Pr(m_{N,j} = 1 \mid \mathbf{n}_N, \mathbf{h}_N, s_N, x_N, \theta) = \begin{cases} 1, & \text{if } x_N = j, \\ 0, & \text{otherwise}. \end{cases} \tag{43}$$
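
The backward induction of (41) to (43) can be sketched for a discrete signal space, where the integrals in (35) and (37) become sums. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes two tables, two states, binary signals that match the true state with probability `P`, a uniform prior, the utility $u(R, n) = R/n$, zero-based indices for tables, states, and signals, and ties broken toward the lowest table index. All names are hypothetical. Instead of tabulating $\Pr(m_{i,j} = X \mid \cdot)$, it tracks the equivalent distribution of the final occupancy $n_j = n_{i,j} + m_{i,j}$.

```python
from functools import lru_cache

N = 3                                   # number of customers (illustrative)
SIZES = ((100.0, 40.0),                 # SIZES[l][j] = R_j(l), table sizes in state l
         (40.0, 100.0))
P = 0.9                                 # assumed signal quality
K, L = 2, 2                             # number of tables and of states
SIGNALS = (0, 1)
PRIOR = (0.5, 0.5)


def likelihood(s, l):
    """f(s | theta = l) for the assumed binary signal model."""
    return P if s == l else 1.0 - P


def belief(signals):
    """Posterior over states after the given signals, by repeated Bayes updates."""
    w = list(PRIOR)
    for s in signals:
        w = [w[l] * likelihood(s, l) for l in range(L)]
    total = sum(w)
    return [x / total for x in w]


@lru_cache(maxsize=None)
def final_count_dist(i, n, h, l, j):
    """Distribution of the final occupancy of table j in state l, given that
    customers i..N still have to choose, grouping n, and signal history h."""
    if i > N:
        return ((n[j], 1.0),)
    dist = {}
    for s in SIGNALS:                   # signal of customer i, drawn from f(. | l)
        x = best_response(i, n, h, s)   # decision region of (34): s maps to table x
        n_next = tuple(c + (k == x) for k, c in enumerate(n))
        for count, prob in final_count_dist(i + 1, n_next, h + (s,), l, j):
            dist[count] = dist.get(count, 0.0) + likelihood(s, l) * prob
    return tuple(dist.items())


@lru_cache(maxsize=None)
def best_response(i, n, h, s):
    """Table maximizing the expected utility of customer i, in the spirit of (41)."""
    g = belief(h + (s,))
    best_j, best_eu = 0, float("-inf")
    for j in range(K):
        n_next = tuple(c + (k == j) for k, c in enumerate(n))
        eu = 0.0
        for l in range(L):
            for count, prob in final_count_dist(i + 1, n_next, h + (s,), l, j):
                eu += g[l] * prob * SIZES[l][j] / count
        if eu > best_eu:
            best_j, best_eu = j, eu
    return best_j


print(best_response(1, (0, 0), (), 0))        # customer 1 with signal 0
print(best_response(3, (2, 0), (0, 0), 0))    # customer 3 facing a crowded table 0
```

With these settings, customer 1 follows his signal, whereas customer 3 moves to the empty table when customers 1 and 2 already occupy the table that all three signals point to. The recursion enumerates every signal sequence, so its cost grows exponentially with the number of customers; it is meant for small examples only.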
VII. Simulation

In this section, we verify the proposed recursive best response and the corresponding equilibrium. We
simulate a Chinese restaurant with two tables $\{1, 2\}$ and two possible states $\theta \in \{1, 2\}$. When $\theta = 1$,
the size of table 1 is $R_1(1) = 100$ and the size of table 2 is $R_2(1) = 40$. When $\theta = 2$, $R_1(2) = 40$
and $R_2(2) = 100$. The state is chosen uniformly at random at the beginning of each simulation, i.e., each
state has probability 0.5. The number of customers is fixed. Each customer receives a randomly generated signal
$s_i$ at the beginning of the simulation. Conditioned on the system state, the signals that customers
receive are independent. The signal distribution $f(s \mid \theta)$ is given by

$$\Pr(s = 1 \mid \theta = 1) = \Pr(s = 2 \mid \theta = 2) = p, \quad \Pr(s = 2 \mid \theta = 1) = \Pr(s = 1 \mid \theta = 2) = 1 - p, \tag{44}$$

where $p \ge 0.5$ can be regarded as the quality of signals. When the signal quality $p$ is closer to 1, the signal
is more likely to reflect the true state $\theta$. With the signals, customers make their decisions sequentially.
After the $i$-th customer makes his choice, he reveals his decision and signal to the other customers. The
game ends after the last customer makes his decision. Then, the utility of customer $i$ choosing table
$j$ is given by $u_i(R_j(\theta), n_j) = R_j(\theta)/n_j$, where $n_j$ is the number of customers choosing table $j$ at the end of
the game.
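
One round of this setup can be drawn as in the following sketch. It only covers the random part of the simulation (state and signals); the function name `sample_round` and the use of a seeded `random.Random` instance are assumptions made for illustration.

```python
import random


def sample_round(num_customers, p, rng):
    """Draw the state and the private signals of one simulated game.

    The state is 1 or 2 with probability 0.5 each, and every signal equals the
    state with probability p, independently across customers, as in (44).
    """
    theta = rng.choice([1, 2])
    signals = [theta if rng.random() < p else 3 - theta for _ in range(num_customers)]
    # Table sizes in the drawn state: the table matching the state is the large one.
    sizes = {1: 100, 2: 40} if theta == 1 else {1: 40, 2: 100}
    return theta, signals, sizes


rng = random.Random(0)                  # fixed seed so that runs are repeatable
theta, signals, sizes = sample_round(5, 0.8, rng)
print(theta, signals, sizes)
```

Averaging the utilities $R_j(\theta)/n_j$ over many such rounds, with the customers' decisions plugged in, yields the expected utilities reported in this section.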

A. Optimality of Proposed Best Response 

We first verify the optimality of the best response. We study the 5-customer scheme with the same
settings as above. We assume that all customers except customer 2 apply their best
response strategies, while customer 2 chooses the table opposite to his best response strategy with a miss
probability $p^{mis}$. We assume the signal quality $p \in [0.5, 1]$ and $p^{mis} \in [0, 1]$ in the simulations.

From Fig. 1(a), we can see that when $p^{mis}$ increases, the expected utility of customer 2 always decreases
for any signal quality $p$. This confirms the optimality of the proposed best response function. We also
observe that when customer 2 has a positive $p^{mis}$, at least one of the other customers will have a better
average utility. Using the expected utility increase of customer 3 shown in Fig. 1(b) as an example, when
customer 2 makes a mistake in choosing the table, customer 3 benefits from the mistake of customer 2
under most signal qualities.
Fig. 1. Expected utility of customers under different miss probabilities of customer 2: (a) customer 2; (b) expected utility increase of customer 3.

Fig. 2. The effect of different table size ratios and signal qualities in the 3-customer restaurant (axes: table size ratio $r$, signal quality $p$): (a) customer with the largest expected utility; (b) utility difference between customers 1 and 2; (c) utility difference between customers 1 and 3.


B. Advantage of Playing Positions vs. Signal Quality 

Next, we investigate how the decision order and the quality of signals affect the utility of customers. We
follow the same settings as in the previous simulations except for the table sizes. We fix the size of one table at
100. The size of the other table is $r \times 100$, where $r$ is the ratio of the table sizes. In the simulations,
we assume the ratio $r \in [0, 1]$. When the ratio $r = 1$, the two tables are identical, but the utilities of choosing
each table may still differ since the number of customers may be odd. When $r = 0$, one table has a
size of 0, which means a customer has a positive utility only when he chooses the correct table.

We first simulate a 3-customer scheme. From Fig. 2, we can see that the advantage of customers
making decisions at different positions is significantly affected by both the signal quality and the table size
ratio. As shown in Fig. 2(a), when the signal quality is high and the table size ratio is low, customer
3 has the largest expected utility. For other regions, customer 1 has the largest expected utility. This
phenomenon can be explained as follows. When the ratio is lower than 1/3, all customers desire the larger
table since, even if all of them select the larger one, each of them still has a utility larger than that of choosing
the smaller one. In such a case, customers who choose late have an advantage since they have collected
more signals and have a higher probability of identifying the large table.

Fig. 3. The effect of different table size ratios and signal qualities in 5- and 10-customer restaurants (axes: table size ratio $r$, signal quality $p$): (a) 5 customers; (b) 10 customers.
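
As a quick check of the threshold on the table size ratio in the 3-customer case, assume the utility $R_j(\theta)/n_j$ used in the simulations, a large table of size 100, and a small table of size $100r$. Under these assumptions, sharing the large table with both other customers still beats sitting alone at the small table exactly when $r < 1/3$:

```latex
\[
\underbrace{\frac{100}{3}}_{\text{large table shared by all three}}
\;>\;
\underbrace{\frac{100\,r}{1}}_{\text{small table taken alone}}
\quad\Longleftrightarrow\quad
r < \frac{1}{3}.
\]
```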

On the one hand, when the signal quality is low, even the third customer cannot form a strong belief on
the true state. In such a case, the difference between the expected sizes of the tables becomes less significant, and
customers' decisions rely more on the negative network externality effect, i.e., how crowded each table is. When the
first two customers choose the same table, customer 3 is more likely to choose the other table to avoid
the negative network externality. On the other hand, when the signal quality is high, customer 3 is likely
to form a strong belief on the true state and will choose the table according to the signals he collected.
Therefore, when the signal quality is high and the table size ratio is low, customer 3 has the advantage of
getting a higher utility in the Chinese restaurant game.

Nevertheless, due to the complicated game structure of the Chinese restaurant game, the effect of signal
quality and table size ratio is generally non-linear. As shown in Fig. 3(a), when the number of customers
increases to 5, similar to the 3-customer scheme, customer 5 has the largest utility when the signal quality
is high and the table size ratio is low, while customer 1 has the largest utility when the signal quality is
low and the table size ratio is high. However, we observe that in some cases, customer 3 becomes the
one with the largest utility. The reason behind this phenomenon is as follows. In these cases, we observe
that the expected number of customers in the larger table is 3, and this table provides the customers a
larger utility than the other one at the equilibrium. Therefore, customers would try to identify this table
and choose it according to their own beliefs. Since customer 3 collects more signals than customers 1
and 2, he is more likely to choose the correct table. Moreover, since he is the third customer to choose
a table, this table is always available to him. Therefore, customer 3 has the largest expected utility in
these cases.

Note that the expected table size is determined by both the signal quality and the table size ratio.
Generally, when the signal quality is low, a customer is less likely to construct a strong belief on the true
state, i.e., the expected sizes of both tables are similar. This suggests that a lower signal quality has
a similar effect on the expected table size as a higher table size ratio. Our arguments are supported by
the concentric-like structure shown in Fig. 3(a). The same arguments can be applied to the 10-customer
scheme, which is shown in Fig. 3(b), where we observe a similar concentric-like structure. Additionally,
we observe that when the table size ratio increases, the index of the customer who has the largest utility
in the peaks decreases from 10 to 5. This is consistent with our arguments since, when the table size
ratio increases, the equilibrium number of customers in the large table decreases from 10 to 5. This also
explains why customer 1 does not have the largest utility when the table size ratio is high. In this case,
the equilibrium number of customers in the large table is 5, and the large table provides higher utilities
to customers in the equilibrium. Since customer 5 can collect more signals than previous customers, he
has better knowledge of the table sizes than customers 1 to 4. Moreover, since customer 5 is the fifth one
to choose a table, he always has the opportunity to choose the large table. In such a case, customer 5
is the one with the largest expected utility when the table size ratio is high.

Next, we discuss two specific scenarios: the resource pool scenario with $r = 0.4$ and the available/unavailable
scenario with $r = 0$. In the resource pool scenario, the size of the second table is 40. Customers act
sequentially and rationally in choosing between these two tables to maximize their utilities. In the available/unavailable
scenario, the size of the second table is 0, which means that a customer has a positive utility only when he chooses
the right table. For both scenarios, we examine the schemes with the number of customers $N = 3$ and
$N = 5$.
Fig. 4. Average utility of customers in the resource pool scenario when $r = 0.4$: (a) 5 customers; (b) 3 customers; (c) best response when $N = 3$.


From Fig. 4, we can see that in the resource pool scenario with $r = 0.4$, customer 1 on average has a
significantly higher utility, which is consistent with the result in Fig. 3(a). Using the 5-customer scheme shown
in Fig. 4(a) as an example, the advantage of playing first becomes significant when the signal quality is
very low ($p < 0.6$) or high ($p > 0.7$). We also find that customer 5 has the lowest
average utility for most signal qualities $p$. We may have a clearer view of this in the 3-customer scheme.
We list the best responses of customers given the received signals in Fig. 4(c). We observe that when the
signal quality $p$ is large, both customers 1 and 2 follow the signals they received to choose the tables.
However, customer 3 does not follow his signal if the first two customers choose the same table. Instead,
customer 3 will choose the table that is still empty. In this case, although customer 3 knows which table is
larger, he does not choose that table since it has been occupied by the first two customers. The network
externality effect dominates the learning advantage in this case.

However, when $p$ is low, the best response of customer 1 is the opposite, i.e., he will choose the table
that is indicated as the smaller one by the signal he received. At first glance, the best response of
customer 1 seems unreasonable. However, such a strategy is indeed customer 1's best response
considering the expected equilibrium in this case. According to Theorem 5, if perfect signals ($p = 1$) are
given, the large table should be chosen by customers 1 and 2 since the utility of the large table, $100/2 = 50$,
is larger than that of the small table, $40/1 = 40$, in the equilibrium. However, when
imperfect signals are given, customers choose the tables based on the expected table sizes. When the signal
quality is low, the uncertainty about the table sizes is large, which leads to similar expected table sizes for
both tables. In such a case, customer 1 favors the table indicated as smaller because it can provide a higher expected
utility, compared with sharing the larger table with another customer.

Fig. 5. Average utility of customers in the available/unavailable scenario when $r = 0$: (a) 5 customers; (b) 3 customers; (c) best response when $N = 3$.
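
The intuition behind customer 1's choice at low signal quality can be illustrated with a rough calculation. It assumes $p = 0.55$ (an illustrative value), the table sizes 100 and 40 of the resource pool scenario, and the utility $R/n$; it compares sharing the table indicated as larger with one other customer against sitting alone at the other table, and it ignores what later customers learn, so it is only a heuristic check and not the full recursive best response.

```latex
\begin{align*}
E[R_{\text{indicated}}] &= 0.55 \cdot 100 + 0.45 \cdot 40 = 73, \\
E[R_{\text{other}}]     &= 0.45 \cdot 100 + 0.55 \cdot 40 = 67, \\
\frac{E[R_{\text{indicated}}]}{2} &= 36.5 \;<\; \frac{E[R_{\text{other}}]}{1} = 67 .
\end{align*}
```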

In the available/unavailable scenario, as shown in Fig. 5, the advantage of customer 1 in playing first
becomes less significant. Using the 5-customer scheme shown in Fig. 5(a) as an example, when the signal quality
$p$ is larger than 0.6, customer 5 has the largest average utility and customer 1 has the smallest average utility.
This is because customers should try their best to identify the available table when
$r = 0$. Learning from previous signals gives the later customers a significant advantage in this case.
Nevertheless, we observe that the best response of a later customer is not necessarily to choose
the table that is more likely to be available. We use the 3-customer scheme as an illustrative example. We list the
best responses of all customers given the received signals in Fig. 5(c). When the signal quality is quite
low ($p = 0.55$), we have the same best response as the one in the resource pool scenario, where the network
externality effect still plays a significant role. Using $(s_1, s_2, s_3) = (2, 2, 1)$ as an example, even though customer
3 finds that table 2 is more likely to be available, his best response is still to choose table 1 since table
2 is already chosen by both customers 1 and 2, and the expected utility of choosing table 1 by
himself is higher than that of choosing table 2 with two other customers. As the signal quality $p$ becomes
high, e.g., $p = 0.9$, customer 3 will choose the table according to all signals $s_1, s_2, s_3$ he collected. The
belief constructed from the signals is now strong enough to overcome the loss due to the network externality
effect, which makes him choose the table that is more likely to be available.
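
The example with $(s_1, s_2, s_3) = (2, 2, 1)$ can be checked numerically with the last customer's best response in (42). The sketch below is a hedged illustration: it assumes a uniform prior, the binary signal model of (44), the utility $R/n$, and that customers 1 and 2 both sit at table 2; the function name `last_customer_best_response` and the dictionary layout are hypothetical.

```python
def last_customer_best_response(p, signals, occupancy, sizes):
    """Best table for the last customer: expected size under the posterior,
    divided by the occupancy after he joins (cf. (42))."""
    weights = {theta: 0.5 for theta in sizes}            # uniform prior over states
    for s in signals:
        for theta in weights:
            weights[theta] *= p if s == theta else 1.0 - p
    total = sum(weights.values())
    belief = {theta: w / total for theta, w in weights.items()}
    expected = {
        j: sum(belief[theta] * sizes[theta][j] / (occupancy[j] + 1) for theta in belief)
        for j in occupancy
    }
    return max(expected, key=expected.get), expected


# Available/unavailable scenario (r = 0): sizes[theta][j] is the size of table j.
SIZES_R0 = {1: {1: 100.0, 2: 0.0}, 2: {1: 0.0, 2: 100.0}}
occupancy = {1: 0, 2: 2}                                 # customers 1 and 2 chose table 2

for p in (0.55, 0.9):
    choice, expected = last_customer_best_response(p, (2, 2, 1), occupancy, SIZES_R0)
    print(p, choice, expected)
```

For $p = 0.55$ the posterior probability that table 2 is available is only 0.55, so sitting alone at table 1 (expected utility about 45) beats sharing table 2 with two others (about 18.3). For $p = 0.9$ the posterior rises to 0.9 and table 2 becomes the better choice (about 30 versus 10).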

VIII. Conclusion 

In this paper, we proposed a new game, called Chinese Restaurant Game, by combining the strategic
game-theoretic analysis and non-strategic machine learning technique. The proposed Chinese restaurant
game provides a new general framework for analyzing the strategic learning and predicting behaviors
of rational agents in a social network with negative network externality. By analyzing the
proposed game, we derived the optimal strategy for each agent and provided a recursive method to achieve
the equilibrium. The trade-off between two contradictory advantages, namely making decisions earlier
to choose better tables and making decisions later to learn more accurate beliefs, is discussed
through simulations. We found that both the signal quality of the unknown system state and the table size
ratio affect the expected utilities of customers with different decision orders. Generally, when the signal
quality is low and the table size ratio is high, the advantage of playing first dominates the benefit from
learning. On the contrary, when the signal quality is high and the table size ratio is low, the advantage
of playing later for better knowledge of the true state increases the expected utility of later agents.